Crowdsourcing-based Annotation of Emotions in Filipino and English Tweets

Authors

Fermin Roberto Lapitan

Riza Theresa Batista-Navarro

Eliezer A. Albacea

Abstract

The automatic analysis of emotions conveyed in social media content, e.g., tweets, has many beneficial applications. In the Philippines, one of the most disaster-prone countries in the world, such methods could enable first responders to make timely decisions despite the risk of data deluge. However, recognising emotions expressed in Philippine-generated tweets, which are mostly written in Filipino, English or a mix of both, is a non-trivial task. To facilitate the development of natural language processing (NLP) methods that automate this type of analysis, we have built a corpus of tweets whose predominant emotions have been manually annotated by means of crowdsourcing. By defining measures to ensure that only high-quality annotations were retained, we have produced a gold standard corpus of 1,146 emotion-labelled Filipino and English tweets. We demonstrate the value of this manually produced resource by showing that an automatic emotion-prediction method based on a publicly available word-emotion association lexicon was unable to reproduce the labels assigned via crowdsourcing. While we plan to make a few extensions to the corpus in the near future, its current version has been made publicly available to foster the development of emotion analysis methods based on advanced Filipino and English NLP.
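The abstract mentions two technical steps. The first filters crowdsourced annotations so that only high-quality labels survive. The second scores a lexicon-based emotion predictor against those labels. This page does not describe the actual quality measures, lexicon or scoring rule. The sketch below is therefore a minimal illustration under stated assumptions:

- Aggregation is a majority vote with a hypothetical minimum-agreement ratio (`min_agreement=0.6`).
- The lexicon is a tiny hypothetical `LEXICON`.
- Prediction picks the emotion with the most exact-match hits.

All names, words and votes are illustrative and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy word-emotion lexicon (token -> associated emotions).
# A real lexicon would be far larger and loaded from a file.
LEXICON = {
    "takot": ("fear",),
    "scared": ("fear",),
    "masaya": ("joy",),
    "happy": ("joy",),
    "galit": ("anger",),
    "angry": ("anger",),
    "baha": ("fear", "sadness"),
    "lungkot": ("sadness",),
}


def aggregate(labels, min_agreement=0.6):
    """Return the majority label if its share of votes reaches min_agreement.

    Returns None when there are no votes or agreement is too low, meaning the
    tweet would be left out of the gold standard (assumed quality rule).
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


def predict(tweet):
    """Predict the emotion with the most lexicon hits, or None if nothing matches.

    Ties go to the emotion counted first (Counter.most_common ordering).
    """
    scores = Counter()
    for token in tweet.lower().split():
        # Exact-match lookup after stripping surrounding punctuation and symbols.
        for emotion in LEXICON.get(token.strip(".,!?#@"), ()):
            scores[emotion] += 1
    if not scores:
        return None
    return scores.most_common(1)[0][0]


def evaluate(crowd_data, min_agreement=0.6):
    """Build a filtered gold set from crowd votes and score the lexicon predictor on it."""
    gold = []
    for tweet, labels in crowd_data:
        label = aggregate(labels, min_agreement)
        if label is not None:
            gold.append((tweet, label))
    if not gold:
        return gold, 0.0
    correct = sum(predict(tweet) == label for tweet, label in gold)
    return gold, correct / len(gold)


# Hypothetical crowd votes: five annotators per tweet.
crowd_data = [
    ("Takot ako sa baha!", ["fear", "fear", "fear", "sadness", "fear"]),
    ("So happy today", ["joy", "joy", "joy", "joy", "surprise"]),
    ("galit na galit ako", ["anger", "anger", "disgust", "disgust", "sadness"]),
    ("Nakakalungkot naman", ["sadness", "sadness", "sadness", "joy", "sadness"]),
]

if __name__ == "__main__":
    gold, accuracy = evaluate(crowd_data)
    print(len(gold), "tweets kept; lexicon accuracy:", round(accuracy, 3))
```

On this toy data, the third tweet is discarded because no label reaches 60% agreement. The lexicon then gets 2 of the 3 remaining tweets right. It misses *nakakalungkot* because only the root *lungkot* is listed. That miss illustrates one way exact surface matching can fail on affixed Filipino forms; it is not a claim about why the paper's baseline fell short.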


Similar resources

3arif: A Corpus of Modern Standard and Egyptian Arabic Tweets Annotated for Epistemic Modality Using Interactive Crowdsourcing

We present 3arif, a large-scale corpus of Modern Standard and Egyptian Arabic tweets annotated for epistemic modality. To create 3arif, we design an interactive crowdsourcing annotation procedure that splits up the annotation process into a series of simplified questions, dispenses with the requirement for expert linguistic knowledge and captures nested modality triggers and their attributes se...

THE JOHNS HOPKINS UNIVERSITY Nerit: Named Entity Recognition for Informal Text

We describe a multilingual named entity recognition system using language independent feature templates, designed for processing short, informal media arising from Twitter and other microblogging services. We crowdsource the annotation of tens of thousands of English and Spanish tweets and present classification results on this resource.

The NewSoMe Corpus: A Unifying Opinion Annotation Framework across Genres and in Multiple Languages

We present the NewSoMe (News and Social Media) Corpus, a set of subcorpora with annotations on opinion expressions across genres (news reports, blogs, product reviews and tweets) and covering multiple languages (English, Spanish, Catalan and Portuguese). NewSoMe is the result of an effort to increase the opinion corpus resources available in languages other than English, and to build a unifying...

An Extended Study of Content and Crowdsourcing-related Performance Factors in Named Entity Annotation

Hybrid annotation techniques have emerged as a promising approach to carry out named entity recognition on noisy microposts. In this paper, we identify a set of content and crowdsourcing-related features (number and type of entities in a post, average length and sentiment of tweets, composition of skipped tweets, average time spent to complete the tasks, and interaction with the user interface)...

Exposing a Set of Fine-Grained Emotion Categories from Tweets

An important starting point in analyzing emotions on Twitter is the identification of a set of suitable emotion classes representative of the range of emotions expressed on Twitter. This paper first presents a set of 48 emotion categories discovered inductively from 5,553 annotated tweets through a small-scale content analysis by trained or expert annotators. We then refine the emotion categori...

Publication date: 2016
